CORE search: 162 research outputs found

The Hidden Web, XML and Semantic Web: A Scientific Data Management Perspective

Author: Nayak Richi
Senellart Pierre
Suchanek Fabian
Varde Aparna
Publication venue
Publication date: 01/01/2011
Field of study

The World Wide Web no longer consists only of HTML pages. Our work sheds light on a number of Internet trends that go beyond simple Web pages. The hidden Web provides a wealth of data in semi-structured form, accessible through Web forms and Web services. These services, as well as numerous other applications on the Web, commonly use XML, the eXtensible Markup Language. XML has become the lingua franca of the Internet, allowing customized markup to be defined for specific domains. On top of XML, the Semantic Web is growing as a common structured data source. In this work, we first explain each of these developments in detail. Using real-world examples from scientific domains of great interest today, we then demonstrate how these new developments can assist in managing, harvesting, and organizing data on the Web. Along the way, we also illustrate current research avenues in these domains. We believe that this effort will help bridge multiple database tracks, thereby attracting researchers with a view to extending database technology. Comment: EDBT - Tutorial (2011)
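The abstract notes that XML lets communities define customized markup for specific domains. As a minimal sketch of what that looks like in practice, the snippet below parses a made-up scientific markup with Python's standard-library `xml.etree.ElementTree` module. The element names (`experiment`, `material`, `property`), the `unit` attribute, and the helper `read_properties` are hypothetical illustrations, not taken from the tutorial or from any published schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical domain-specific markup for a materials measurement.
SAMPLE = """
<experiment id="exp-42">
  <material name="steel">
    <property name="hardness" unit="HRC">58</property>
    <property name="density" unit="g/cm3">7.85</property>
  </material>
</experiment>
"""


def read_properties(xml_text):
    """Map each <property> name to a (numeric value, unit) pair."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    # iter() walks the whole subtree, so nesting depth does not matter.
    for prop in root.iter("property"):
        result[prop.get("name")] = (float(prop.text), prop.get("unit"))
    return result


if __name__ == "__main__":
    print(read_properties(SAMPLE))
```

Because the tags carry domain meaning (a `property` with a `unit`), a consumer can extract typed values without screen-scraping presentation markup, which is the advantage over plain HTML that the abstract alludes to.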

arXiv.org e-Print Archive

HAL-CentraleSupelec

CiteSeerX

Montclair State University Digital Commons

INRIA a CCSD electronic archive server

Queensland University of Technology ePrints Archive

Hal-Diderot

HAL-Rennes 1

Open Digital Forms

Author: Le Hiep
Rebele Thomas
Suchanek Fabian
Publication venue: Springer Science and Business Media LLC
Publication date: 10/08/2016
Field of study

International audience. The maintenance of digital libraries often involves physical paper forms. Such forms are tedious to handle for both senders and receivers. Several commercial solutions exist for digitizing forms. However, most of them are proprietary, expensive, or centralized, or they require software installation. With this demo, we propose a free, secure, and lightweight framework for digital forms. It is based on HTML documents with embedded JavaScript, uses only open standards, and does not require a centralized architecture. Our forms can be digitally signed with the OpenPGP standard, and they contain machine-readable RDFa. Thus, they allow for the semantic analysis, sharing, reuse, or merging of documents across users or institutions.
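To illustrate what "machine-readable RDFa" inside an HTML form buys a receiver, here is a minimal sketch using only Python's standard-library `html.parser`. It is an assumption-laden simplification, not the framework's own code: the class `RDFaPropertyExtractor`, the sample form, and the use of the schema.org vocabulary are hypothetical, vocabulary and prefix resolution are skipped, and OpenPGP signing is not covered.

```python
from html.parser import HTMLParser


class RDFaPropertyExtractor(HTMLParser):
    """Collect (property, value) pairs from RDFa `property` attributes.

    Simplified: ignores vocab/prefix resolution, `about`/`typeof` subjects,
    and nested properties that full RDFa processing would handle.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._pending = None  # (tag, property, text chunks) awaiting its end tag

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        prop = attributes.get("property")
        if prop is None:
            return
        if attributes.get("content") is not None:
            # Explicit machine-readable value, e.g. on a <meta> element.
            self.pairs.append((prop, attributes["content"]))
        else:
            # Value is the element's text; collect it until the end tag.
            self._pending = (tag, prop, [])

    def handle_data(self, data):
        if self._pending is not None:
            self._pending[2].append(data)

    def handle_endtag(self, tag):
        if self._pending is not None and self._pending[0] == tag:
            _, prop, chunks = self._pending
            self.pairs.append((prop, "".join(chunks).strip()))
            self._pending = None


# Hypothetical form fragment using the schema.org vocabulary.
FORM_HTML = """
<form vocab="http://schema.org/" typeof="Person">
  <label>Name: <span property="name">Ada Lovelace</span></label>
  <meta property="email" content="ada@example.org">
</form>
"""

if __name__ == "__main__":
    parser = RDFaPropertyExtractor()
    parser.feed(FORM_HTML)
    print(parser.pairs)
    # [('name', 'Ada Lovelace'), ('email', 'ada@example.org')]
```

The same HTML file stays human-readable in a browser while exposing structured fields, which is what makes merging or analyzing forms across institutions possible without a central server. A production system would use a complete RDFa processor instead of this toy parser.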

Crossref

HAL Descartes

Emerging multidisciplinary research across database management systems

Author: Nica Anisoara
Suchanek Fabian
Varde Aparna
Publication venue
Publication date: 01/01/2000
Field of study

The database community is exploring more and more multidisciplinary avenues: data semantics overlaps with ontology management; reasoning tasks venture into the domain of artificial intelligence; and data stream management and information retrieval shake hands, e.g., when processing Web click-streams. These new research avenues become evident, for example, in the topics that doctoral students choose for their dissertations. This paper surveys the emerging multidisciplinary research by doctoral students in database systems and related areas. It is based on PIKM 2010, the 3rd Ph.D. workshop at the International Conference on Information and Knowledge Management (CIKM). The topics addressed include ontology development, data streams, natural language processing, medical databases, green energy, cloud computing, and exploratory search. In addition to core ideas from the workshop, we list some open research questions in these multidisciplinary areas.

arXiv.org e-Print Archive

Crossref

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

CERN Document Server

Dissertations of the University of Groningen

Emerging multidisciplinary research across database management systems

Author: Nica Anisoara
Suchanek Fabian
Varde Aparna
Publication venue
Publication date: 01/01/2010
Field of study

arXiv.org e-Print Archive

HAL-CentraleSupelec

CiteSeerX

Montclair State University Digital Commons

INRIA a CCSD electronic archive server

HAL-Rennes 1

Ontology Alignment at the Instance and Schema Level

Author: Fabian M. Suchanek
Pierre Senellart
Serge Abiteboul
Équipe-projet Webdam
Publication venue
Publication date: 01/01/2011
Field of study

We present PARIS, an approach for the automatic alignment of ontologies. PARIS aligns not only instances, but also relations and classes. Alignments at the instance level cross-fertilize with alignments at the schema level. Thereby, our system provides a truly holistic solution to the problem of ontology alignment. The approach is probabilistic at its core, which allows PARIS to run without any parameter tuning. We demonstrate the efficiency of the algorithm and its precision through extensive experiments. In particular, we obtain a precision of around 90% in experiments with two of the world's largest ontologies. Comment: Technical Report at INRIA RT-040
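The abstract only says that the approach is probabilistic. As a rough intuition pump, the sketch below scores whether two instances are the same entity by treating each shared (relation, object) fact as independent evidence weighted by the relation's inverse functionality: a relation whose objects are rarely shared (such as an email address) is strong evidence, while a common birthplace is weak evidence. This is a simplified illustration loosely modeled on that idea, not the full PARIS model, which per the abstract also aligns relations and classes. It assumes both ontologies use the same relation names and that objects match only when they are identical strings; all names and data are hypothetical.

```python
from collections import defaultdict


def inverse_functionality(facts):
    """fun^-1(r): number of distinct objects of r divided by number of r-facts.

    A value of 1.0 means no two subjects share an object (e.g. an email address).
    """
    objects = defaultdict(set)
    counts = defaultdict(int)
    for _subject, relation, obj in facts:
        objects[relation].add(obj)
        counts[relation] += 1
    return {r: len(objects[r]) / counts[r] for r in counts}


def match_probability(x, x2, facts1, facts2, inv_fun):
    """Probability that x (ontology 1) and x2 (ontology 2) denote the same entity.

    Simplifying assumptions: relation names are shared across the two ontologies
    and objects match only if they are identical strings.
    """
    pairs1 = {(r, o) for s, r, o in facts1 if s == x}
    pairs2 = {(r, o) for s, r, o in facts2 if s == x2}
    prob_no_match = 1.0
    for relation, _obj in pairs1 & pairs2:
        # Each shared fact independently rules out "different entities"
        # with probability fun^-1(relation).
        prob_no_match *= 1.0 - inv_fun.get(relation, 0.0)
    return 1.0 - prob_no_match


facts1 = [
    ("Elvis", "bornIn", "Tupelo"),
    ("Elvis", "email", "elvis@example.org"),
    ("Vernon", "bornIn", "Tupelo"),
    ("Priscilla", "bornIn", "Brooklyn"),
]
facts2 = [
    ("E_Presley", "bornIn", "Tupelo"),
    ("E_Presley", "email", "elvis@example.org"),
    ("V_Presley", "bornIn", "Tupelo"),
]

if __name__ == "__main__":
    inv_fun = inverse_functionality(facts1)
    print(match_probability("Elvis", "E_Presley", facts1, facts2, inv_fun))   # 1.0
    print(match_probability("Vernon", "V_Presley", facts1, facts2, inv_fun))  # about 0.67
```

Note that `Vernon` also scores about 0.67 against `E_Presley`, since a shared birthplace alone cannot separate the two; evidence from more discriminative relations is what resolves such ties.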

arXiv.org e-Print Archive

HAL-CentraleSupelec

CiteSeerX

INRIA a CCSD electronic archive server

HAL-Rennes 1

The Locality and Symmetry of Positional Encodings

Author: Chen Lihu
Suchanek Fabian M.
Varoquaux Gaël
Publication venue
Publication date: 19/10/2023
Field of study

Positional Encodings (PEs) are used to inject word-order information into transformer-based language models. While they can significantly enhance the quality of sentence representations, their specific contribution to language models is not fully understood, especially given recent findings that various positional encodings are insensitive to word order. In this work, we conduct a systematic study of positional encodings in Bidirectional Masked Language Models (BERT-style), which complements existing work in three aspects: (1) we uncover the core function of PEs by identifying two common properties, Locality and Symmetry; (2) we show that the two properties are closely correlated with performance on downstream tasks; (3) we quantify the weakness of current PEs by introducing two new probing tasks, on which current PEs perform poorly. We believe that these results are the basis for developing better PEs for transformer-based language models. The code is available at https://github.com/tigerchen52/locality_symmetry. Comment: Long Paper in Findings of EMNLP2
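For intuition about the two properties, the sketch below uses the fixed sinusoidal encodings of the original Transformer rather than the BERT-style encodings studied in the paper, so it illustrates the ideas and does not reproduce the paper's measurements. For sinusoidal PEs, the dot product of the encodings at positions i and j equals the sum of cos(ω_k (i - j)) over all frequencies ω_k, so the similarity depends only on the offset i - j. It is therefore symmetric in that offset and largest at offset 0. The function names and the choice `d_model = 16` are arbitrary assumptions.

```python
import math


def sinusoidal_pe(pos, d_model=16, base=10000.0):
    """Sinusoidal positional encoding: (sin, cos) pairs at geometric frequencies.

    d_model must be even.
    """
    vec = []
    for k in range(d_model // 2):
        omega = 1.0 / base ** (2 * k / d_model)
        vec.append(math.sin(pos * omega))
        vec.append(math.cos(pos * omega))
    return vec


def pe_similarity(i, j, d_model=16):
    """Dot product of the encodings of positions i and j."""
    return sum(a * b for a, b in zip(sinusoidal_pe(i, d_model), sinusoidal_pe(j, d_model)))


if __name__ == "__main__":
    # Similarity of position 10 to its neighbours: peaks at offset 0 and is
    # identical for offsets -k and +k.
    for offset in range(-3, 4):
        print(offset, round(pe_similarity(10, 10 + offset), 3))
```

Learned encodings, such as those in BERT, need not have either property by construction, which is why measuring how closely trained models approximate locality and symmetry is informative.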

arXiv.org e-Print Archive

ESTER: efficient search on text, entities, and relations

Author: Alexandru Chitea
Fabian Suchanek
Holger Bast
Ingmar Weber
Publication venue: SIGIR
Publication date: 01/01/2007
Field of study

We present ESTER, a modular and highly efficient system for combined full-text and ontology search. ESTER builds on a query engine that supports two basic operations: prefix search and join. Both can be implemented very efficiently with a compact index, yet in combination they provide powerful querying capabilities. We show how ESTER can answer basic SPARQL graph-pattern queries on the ontology by reducing them to a small number of these two basic operations. ESTER further supports a natural blend of such semantic queries with ordinary full-text queries. Moreover, the prefix search operation allows for a fully interactive and proactive user interface, which after every keystroke suggests possible semantic interpretations of the user's query and speculatively executes the most likely of these interpretations. As a proof of concept, we applied ESTER to the English Wikipedia, which contains about 3 million documents, combined with the recent YAGO ontology, which contains about 2.5 million facts. For a variety of complex queries, ESTER achieves worst-case query processing times of a fraction of a second on a single machine, with an index size of about 4 GB.
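The two primitives named in the abstract can be illustrated with a toy in-memory index. In the sketch below, prefix search uses binary search over a sorted vocabulary (all words sharing a prefix form one contiguous run), and the join is reduced to a plain intersection of document sets. That reduction is a simplification chosen for brevity and not necessarily how ESTER defines its join; the class and function names and the sample documents are hypothetical, and the index is not ESTER's compact index.

```python
import bisect
from collections import defaultdict


class PrefixIndex:
    """Toy in-memory index: sorted vocabulary plus per-word document sets."""

    def __init__(self, docs):
        postings = defaultdict(set)
        for doc_id, text in docs.items():
            for word in text.lower().split():
                postings[word].add(doc_id)
        self.words = sorted(postings)  # sorted, so a prefix forms a contiguous run
        self.postings = dict(postings)

    def prefix_search(self, prefix):
        """Return (completions, doc_ids) for every word starting with `prefix`."""
        start = bisect.bisect_left(self.words, prefix)
        completions, doc_ids = [], set()
        for word in self.words[start:]:
            if not word.startswith(prefix):
                break  # left the contiguous run of matching words
            completions.append(word)
            doc_ids |= self.postings[word]
        return completions, doc_ids


def join(left_docs, right_docs):
    """Simplified join: keep documents that satisfy both sub-queries."""
    return left_docs & right_docs


index = PrefixIndex({
    1: "Elvis Presley singer born Tupelo",
    2: "Tupelo Mississippi city",
    3: "Johnny Cash singer born Kingsland",
})

if __name__ == "__main__":
    _, singers = index.prefix_search("sing")
    _, tupelo = index.prefix_search("tup")
    print(join(singers, tupelo))  # {1}
```

The completions list returned by `prefix_search` is what enables keystroke-by-keystroke suggestions: after typing "c", the user could be offered both "cash" and "city" before finishing the word.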

CiteSeerX

MPG.PuRe

Knowledge harvesting from text and web sources

Author: Fabian Suchanek
Gerhard Weikum
Publication venue
Publication date: 11/04/2020
Field of study

The proliferation of knowledge-sharing communities such as Wikipedia and the progress in scalable information extraction from Web and text sources have enabled the automatic construction of very large knowledge bases. Recent endeavors of this kind include academic research projects such as DBpedia, KnowItAll, Probase, ReadTheWeb, and YAGO, as well as industrial ones such as Freebase and Trueknowledge. These projects provide automatically constructed knowledge bases of facts about named entities, their semantic classes, and their mutual relationships. Such world knowledge in turn enables cognitive applications and knowledge-centric services like disambiguating natural-language text, deep question answering, and semantic search for entities and relations in Web and enterprise data. Prominent examples of how knowledge bases can be harnessed include the Google Knowledge Graph and the IBM Watson question answering system. This tutorial presents state-of-the-art methods, recent advances, research opportunities, and open challenges along this avenue of knowledge harvesting and its applications.

CiteSeerX

Knowledge Bases in the Age of Big Data Analytics

Author: Suchanek Fabian M.
Weikum Gerhard
Publication venue
Publication date: 01/01/2014
Field of study

CISPA – Helmholtz-Zentrum für Informationssicherheit

MPG.PuRe